DETAILED ACTION 
EXAMINER'S AMENDMENT 
1. An examiner's amendment to the record appears below. Should the changes 
and/or additions be unacceptable to applicant, an amendment may be filed as provided 
by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be 
submitted no later than the payment of the issue fee. 

Authorization for this examiner's amendment was given in a telephone interview 
with Mr. Richard Hinson on 9/1/2005. The application has been amended as follows: 

Claims 19-28 have been cancelled. 

Claims 1-18 have been amended as follows: 

1. A method of automatically updating a word database and a pronunciation 
database used by a speech recognition engine to convert speech utterances to 
text, the method comprising: 

taking a realization of spoken audio and a first representation that is an 
allegedly true textual representation for said realization; 

generating a second representation by performing speech recognition on 
said realization using the word database, said second representation being a 
time-based transcription of said realization; 

expanding said first and second representations to convert each acronym 
and abbreviation contained in said first and second representations to a speech 
equivalent; 

processing the first representation to remove all markup language tags; 

generating a line-by-line output by aligning said first representation and 
said second representation based on timed intervals derived from the time-based 
transcription of said realization, each line matching a segment of said first 
representation and a corresponding segment of said second representation for a 
particular one of the timed intervals; 

detecting and marking each line of output that comprises a one-word 
segment of said first representation and a one-word segment of said second 
representation; 

for each marked line of output whose one-word segment of said first 
representation and one-word segment of said second representation are similar, 
automatically updating said pronunciation database to include said similar one- 
word segments and a corresponding portion of said spoken audio; and 

for each marked line of output whose one-word segment of said first 
representation and one-word segment of said second representation are 
dissimilar, automatically updating said word database to include said dissimilar 
one-word segments and a corresponding portion of said spoken audio. 
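The preprocessing and routing steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the abbreviation table, the regular-expression tag stripper, the `AlignedLine` and `Databases` records, and the similarity test (a `difflib` ratio compared against an assumed threshold of 0.6) are all hypothetical choices, since the claim does not say how "similar" is measured. The sketch assumes the line-by-line alignment has already been produced and stands in the timed interval for the corresponding portion of spoken audio.

```python
import re
from dataclasses import dataclass, field
from difflib import SequenceMatcher

# Hypothetical abbreviation table; the claims do not specify one.
EXPANSIONS = {"dr.": "doctor", "st.": "street", "ibm": "i b m"}

# Rough approximation of a markup tag such as <p> or </b>.
TAG_RE = re.compile(r"<[^>]+>")

# Assumed cut-off for treating two single words as "similar".
SIMILARITY_THRESHOLD = 0.6


def strip_markup(text: str) -> str:
    """Replace every markup tag with a space."""
    return TAG_RE.sub(" ", text)


def expand(text: str) -> str:
    """Replace each known acronym or abbreviation with a spoken equivalent."""
    return " ".join(EXPANSIONS.get(tok.lower(), tok) for tok in text.split())


@dataclass
class AlignedLine:
    start: float      # start of the timed interval, in seconds
    end: float        # end of the timed interval, in seconds
    reference: str    # segment of the first (allegedly true) representation
    recognized: str   # segment of the second (recognized) representation


@dataclass
class Databases:
    pronunciation: list = field(default_factory=list)
    words: list = field(default_factory=list)


def is_marked(line: AlignedLine) -> bool:
    """A line is marked when both segments contain exactly one word."""
    return len(line.reference.split()) == 1 and len(line.recognized.split()) == 1


def route(lines: list, dbs: Databases) -> None:
    """Send each marked line to the pronunciation or the word entries."""
    for line in lines:
        if not is_marked(line):
            continue  # unmarked (multi-word) lines are not used for updates
        ratio = SequenceMatcher(
            None, line.reference.lower(), line.recognized.lower()
        ).ratio()
        # The (start, end) interval stands in for the matching audio portion.
        entry = (line.reference, line.recognized, (line.start, line.end))
        if ratio >= SIMILARITY_THRESHOLD:
            dbs.pronunciation.append(entry)
        else:
            dbs.words.append(entry)
```

With these assumptions, a marked line pairing "colour" with "color" would go to the pronunciation entries, a line pairing "Nguyen" with "when" would go to the word entries, and a two-word line would be skipped because it is not marked.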

2. The method of claim 1, further comprising obtaining said first 
representation by optical character recognition using an optical character 
recognition device. 

3. The method of claim 1, wherein the word database comprises a speaker- 
dependent database used to adapt the speech recognition to a particular 
speaker. 

4. The method of claim 1, further comprising comparing a recognition quality 
of said speech recognition of said realization with a recognition quality of a 
corresponding single-word entry existing in said pronunciation database. 

5. A method of automatically updating a word database and a pronunciation 
database used by a speech recognition engine to convert speech utterances to 
text, the method comprising: 

taking a realization of spoken audio and a first representation that is an 
allegedly true textual representation for said realization; 

producing a second representation that is a textual representation of said 
realization by performing a speech recognition on said realization using the word 
database; 

expanding said first and second representations to convert each acronym 
and abbreviation contained in said first and second representations to a speech 
equivalent; 

generating a line-by-line output by aligning said first representation and 
said second representation, each line of said output comprising a segment of 
said first representation, a segment of said second representation, and a time 
indicator indicating a start time and end time of said segments; 

detecting and marking each line of output that comprises a one-word 
segment of said first representation and a one-word segment of said second 
representation; 

for each marked line of output whose one-word segment of said first 
representation and one-word segment of said second representation are similar, 
automatically updating said pronunciation database to include said similar one- 
word segments and a corresponding portion of said spoken audio; and 

for each marked line of output whose one-word segment of said first 
representation and one-word segment of said second representation are 
dissimilar, automatically updating said word database to include said dissimilar 
one-word segments and a corresponding portion of said spoken audio. 

6. The method of claim 5, further comprising obtaining said first 
representation by optical character recognition using an optical character 
recognition device. 

7. The method of claim 5, wherein the word database comprises a speaker- 
dependent database used to adapt the speech recognition to a particular 
speaker. 

8. The method of claim 5, further comprising comparing a recognition quality 
of said speech recognition of said realization with a recognition quality of a 
corresponding single-word entry existing in said pronunciation database. 

9. A system for automatically updating a word database and a pronunciation 
database, the system comprising: 

an audio device for taking a realization of spoken audio; 

a text reader for taking a first representation that is an allegedly true 
textual representation of said realization; 

a speech recognizer that performs a speech recognition on said realization 
to generate a second representation from said realization, said second 
representation being a time-based transcription of said realization; 

a word database used by the speech recognizer to perform speech 
recognition tasks; 

an expander that expands said first and second representations to convert 
each acronym and abbreviation contained in said first and second 
representations to a speech equivalent; 

an aligner configured to generate a line-by-line output by aligning said first 
representation and said second representation based on timed intervals derived 
from the time-based transcription of said second representation, each line 
matching a segment of said first representation and a corresponding segment of 
said second representation for a particular one of the timed intervals; 

a classifier configured to detect and mark each line of output that 
comprises a one-word segment of said first representation and a one-word 
segment of said second representation; and 

a selector that for each marked line of output whose one-word segment of 
said first representation and one-word segment of said second representation 
are similar, automatically updates said pronunciation database to include said 
similar one-word segments and a corresponding portion of said spoken audio, 
and for each marked line of output whose one-word segment of said first 
representation and one-word segment of said second representation are 
dissimilar, automatically updates said word database to include said dissimilar 
one-word segments and a corresponding portion of said spoken audio. 

10. The system of claim 9, wherein the text reader comprises an optical 
character reader. 

11. A machine-readable storage, having stored thereon a computer program 
having a plurality of code sections executable by a machine for causing the 
machine to perform the steps of: 

taking a realization of spoken audio and a first representation that is an 
allegedly true textual representation for said realization; 

generating a second representation by performing speech recognition on 
said realization using the word database, said second representation being a 
time-based transcription of said realization; 

expanding said first and second representations to convert each acronym 
and abbreviation contained in said first and second representations to a speech 
equivalent; 

processing the first representation to remove all markup language tags; 

generating a line-by-line output by aligning said first representation and 
said second representation based on timed intervals derived from the time-based 
transcription of said second representation, each line matching a segment of said 
first representation and a corresponding segment of said second representation 
for a particular one of the timed intervals; 

detecting and marking each line of output that comprises a one-word 
segment of said first representation and a one-word segment of said second 
representation; 

for each marked line of output whose one-word segment of said first 
representation and one-word segment of said second representation are similar, 
automatically updating a pronunciation database to include said similar one-word 
segments and a corresponding portion of said spoken audio; and 

for each marked line of output whose one-word segment of said first 
representation and one-word segment of said second representation are 
dissimilar, automatically updating a word database to include said dissimilar one- 
word segments and a corresponding portion of said spoken audio. 

12. The machine-readable storage of claim 11, further comprising a machine- 
executable code section to perform the step of obtaining said first representation 
by optical character recognition using an optical character recognition device. 

13. The machine-readable storage of claim 11, wherein the word database 
comprises a speaker-dependent database used to adapt the speech recognition 
to a particular speaker. 

14. The machine-readable storage of claim 11, further comprising a machine- 
executable code section to perform the step of comparing a recognition quality of 
said speech recognition of said realization with a recognition quality of a 
corresponding single-word entry existing in said pronunciation database. 

15. A machine-readable storage, having stored thereon a computer program 
having a plurality of code sections executable by a machine for causing the 
machine to perform the steps of: 

taking a realization of spoken audio and a first representation that is an 
allegedly true textual representation for said realization; 

producing a second representation that is a textual representation of said 
realization by performing a speech recognition on said realization using the word 
database; 

expanding said first and second representations to convert each acronym 
and abbreviation contained in said first and second representations to a speech 
equivalent; 

generating a line-by-line output by aligning said first representation and 
said second representation, each line of said output comprising a segment of 
said first representation, a segment of said second representation, and a time 
indicator indicating a start time and end time of said segments; 

detecting and marking each line of output that comprises a one-word 
segment of said first representation and a one-word segment of said second 
representation; 

for each marked line of output whose one-word segment of said first 
representation and one-word segment of said second representation are similar, 
automatically updating a pronunciation database to include said similar one-word 
segments and a corresponding portion of said spoken audio; and 

for each marked line of output whose one-word segment of said first 
representation and one-word segment of said second representation are 
dissimilar, automatically updating a word database to include said dissimilar one- 
word segments and a corresponding portion of said spoken audio. 

16. The machine-readable storage of claim 15, further comprising a machine- 
executable code section to perform the step of obtaining said first representation 
by optical character recognition using an optical character recognition device. 

17. The machine-readable storage of claim 15, wherein the word database 
comprises a speaker-dependent database used to adapt the speech recognition 
to a particular speaker. 

18. The machine-readable storage of claim 15, further comprising a machine- 
executable code section to perform the step of comparing a recognition quality of 
said speech recognition of said realization with a recognition quality of a 
corresponding single-word entry existing in said pronunciation database. 

Allowable Subject Matter 
2. Claims 1-18 are allowed over prior art of record. The following is an examiner's 
statement of reasons for allowance: Glickman et al. (US 6076059) disclose a method in 
which text segments of a text file are aligned with audio segments of an audio file. The 
text file includes written words, and the audio file includes spoken words. A vocabulary 
and a language model are generated from the text segment. A word list is recognized 
from the audio segment using the vocabulary and language model. The word list is 
aligned with the text segment, and corresponding anchors are chosen in the word list 
and text segment. Using the anchors, the text segment and the audio segment are 
partitioned into unaligned and aligned segments according to the anchors. These steps 
are repeated for any unaligned segments until a termination condition is reached (see 
reference). Glickman et al. fail to specifically disclose the step of expanding acronyms 
and abbreviations and removing markup language tags before aligning the first 
representation against the second representation. Glickman et al. also fail to specifically 
disclose that for each marked line of output whose one-word segment of said first 
representation and one-word segment of said second representation are dissimilar, 
automatically updating a word database to include said dissimilar one-word segments 
and a corresponding portion of said spoken audio. Furthermore, it would not have been 
obvious to one of ordinary skill in the art at the time of invention to modify Glickman et 
al. in order to obtain the claimed invention. Therefore, claims 1-18 are allowed over 
prior art of record. 
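The anchor-based procedure attributed to Glickman et al. can be approximated by the Python sketch below. It is a hypothetical illustration, not the reference's method: it uses `difflib.SequenceMatcher` matching blocks as anchors, treats the gaps between anchors as unaligned, and re-aligns those gaps with a shorter minimum anchor length until that length reaches zero (an assumed termination condition). Unlike the reference as summarized, it only re-aligns word lists and does not regenerate a vocabulary or re-run recognition on the unaligned audio.

```python
from difflib import SequenceMatcher


def anchored_align(text, recog, min_anchor=3):
    """Partition two word lists into aligned and unaligned spans.

    Returns (text_words, recog_words, aligned) tuples in document order.
    Runs of at least `min_anchor` identical words serve as anchors; the
    gaps between anchors are re-aligned with a shorter minimum length.
    """
    if not text and not recog:
        return []
    if not text or not recog or min_anchor < 1:
        # Termination: nothing left to anchor against.
        return [(text, recog, False)]
    matcher = SequenceMatcher(None, text, recog, autojunk=False)
    anchors = [m for m in matcher.get_matching_blocks() if m.size >= min_anchor]
    if not anchors:
        # No anchor is long enough; relax the requirement and retry.
        return anchored_align(text, recog, min_anchor - 1)
    result, i, j = [], 0, 0
    for a, b, size in anchors:
        # Words between the previous anchor and this one are still unaligned.
        result += anchored_align(text[i:a], recog[j:b], min_anchor - 1)
        result.append((text[a:a + size], recog[b:b + size], True))
        i, j = a + size, b + size
    result += anchored_align(text[i:], recog[j:], min_anchor - 1)
    return result
```

For example, aligning "the quick brown fox jumps over the lazy dog" against a recognized "the quick brown box jumps over a lazy dog" yields three aligned spans and two unaligned ones ("fox"/"box" and "the"/"a").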

Any comments considered necessary by applicant must be submitted no later 
than the payment of the issue fee and, to avoid processing delays, should preferably 
accompany the issue fee. Such submissions should be clearly labeled "Comments on 
Statement of Reasons for Allowance." 

Conclusion 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Huyen X. Vo, whose telephone number is 571-272-7631. 
The examiner can normally be reached on M-F, 9-5:30. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Wayne Young, can be reached on 571-272-7582. The fax phone number for 
the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 

9/6/2005 HXV 
PRIMARY EXAMINER 


